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Abstract. This note is devoted to an analysis of the so-called peeling algorithm in 
wavelet denoising. Assuming that the wavelet coefficients of the signal can be modeled 
by generalized Gaussian random variables, we compute a critical thresholding constant 
for the algorithm, which depends on the shape parameter of the generalized Gaussian 
distribution. We also quantify the optimal number of steps which have to be performed, 
and analyze the convergence of the algorithm. Several versions of the obtained algo- 
rithm were implemented and tested against classical wavelet denoising procedures on 
benchmark and simulated biological signals. 



1. Introduction 

Among the wide range of applications of wavelet theory which have emerged during 
the last 20 years, the processing of noisy signals is certainly one of the most important 
one. Especially attractive to the community has been the thresholding algorithm, and the 
great amount of efforts in this direction is well represented by the enthusiastic discussion 
in [7], by the application oriented presentation pQ or by the sharp uniform central limit 
theorems in [S]. This fundamental algorithm can be summarized in the following way: 
recall that the wavelet decomposition of a function z G L^(M) is usually written as: 

z{t) = aj^k(pjok{t) + YY1 f^Jk^jkit), (1) 

fc=0 j=jo k=0 

where the coefficients a,f] are obtained by projection in L^(M): 



«jofc = {<Pjok, z) L2(R) > and f3jk = {ipjk, z) 



The functions '0 and cf) are respectively called mother and father wavelets, and enjoy 
some suitable scaling and algebraic properties (see e.g. O |T2] for a complete account on 
wavelet decompositions). In this context, the thresholding algorithm assumes that, if z 
can be decomposed into z = x + where x is the useful signal and n its noisy part, 
then the wavelet coefficients corresponding to n will typically be very small. A reasonable 
estimation for the signal x is thus: 

2^0-1 J 2^-1 

X{t) = ^ aj^k l{|a,ofc|>r} <Pjok{t) + X] 5Z '^i\f^Jk\>r} ^jk{t), 
k=0 j=jo k=0 
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where r is a suitable threshold (which may also depend on the resolution j and, in practice, 
is often null for the coefficients aj^ of the father wavelet / scale function) and where J 
corresponds to the maximal resolution one is allowed to consider. It is then proved in the 
aforementioned references [H [7] that this kind of estimator satisfies some nice properties 
concerning the asymptotic behavior of the approximation error, in terms of the total 
number of wavelet coefficients (which is denoted by in the sequel). 

One of the drawbacks of the thresholding algorithm is that it may also spoil the origi- 
nal signal X. The critical issue is the value of the threshold(s) r: too low it is inefficient, 
too high it distorts the information from x. In order to improve the performances of 
wavelet-based denoising algorithms by adapting them to the processed signals, the fol- 
lowing iterative method, called peeling algorithm, has been introduced and shown to be 
particularly useful for biomedical applications in [H [H]. It still relies on an a priori de- 
composition of the observed signal z into z = x + n, where x is the signal itself, and n 
is a noise. The algorithm intends then to separate x from n iteratively, and the k^^ step 
of the procedure produces an estimated signal Xk, and a noise Uk, initialized for k = 
as uq = z. These functions will always be assimilated with the vector of their wavelet 
coefficients. Then the [k + 1)**^ step is as follows: 

II II 2 

(1) Compute al = , where we recall that denotes the total number of wavelet 
coefficients involved in the analysis. 

(2) Set a thresholding level Tk+i as Tk+i = h{ak), where h is usually linear, which 
means that T^+i = F at for a certain coefficient F. 

(3) Compute Ax^+i as: 

AXfc+l(g) = nk{q) l{|n,,(g)|>T,+i}, 

for all the coefficients q of the wavelet decomposition. The vectors Xfc+i, rifc+i are 
then defined as Xk+i = Xk + Ax^+i, and n^+i = — Ax^+i. 

(4) Loop this procedure until a stop criterion of the form — ll^^fe+iP < £^ is 
reached, for a certain positive constant e. Notice that one can choose e = 0. 

This iterative procedure tends to retrieve a higher quantity of (approximate) signal x from 
the noisy input z, correcting some of the failures of the original thresholding algorithm in 
some special situations. 

On the basis of these promising experimental results, the peeling algorithm has been 
further investigated in [HI [IS], and it has been first observed in those references that the 
peeling problem could be handled through a fixed point algorithm. This possibility stems 
basically from the fact that the sequence {T^; /c > 0} is decreasing (as Unfcp > ||nfc+ip), 
which means that the previous algorithm can be reduced to the following: 

(1) Set To = +00 and T^+i = /Ar(Tfc), where /a? is of the form: 

2^q<N^ y1) ^{\z{q)\<x} 



fN{x) 



N 



(2) 



For a suitable constant F, this defines a converging decreasing sequence (T^), such 
that lim/j^oo Tj. = Tf > 0. 
(2) Stop the loop when T^+i = Ty, and then set 

xil) = z{q)l{\z{q)\>Tf}- (3) 
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It is shown in [T3] that this algorithm is almost surely convergent, and a further analysis 
of the coefficient F is performed in [15] . 

However, in spite of the efforts made in the aforementioned references [IHII5], a prob- 
abilistic analysis of the algorithm is still missing. The current article proposes to make 
a step in this direction, and we proceed now to describe the results we have obtained. 
First of all, let us say a few words about the model we have chosen for our signal z. This 
signal is of course characterized by the family of its wavelets coefficients, which will be 
denoted from now on by {z{q)\ q < N}, and it is usual in signal processing to model these 
coefficients by independent generalized Gaussian variables (see e.g. [13]), all defined on 
a common complete probability space JF, P). To be more specific, we will assume the 
following: 

Hypothesis 1.1. The wavelet coefficients {z{q)\ q < N} of our signal z form an i.i.d 
family of generalized Gaussian variables, whose common density {Pa,u{x))xm is given by 

p.,.(x)=ae-l^-l", wUh f3=-(^^Y\ ^ = ^^ivr-V 

a \V{l/u)J 2V{l/u) 

where V stands for the usual Gamma function = e^^x^^^dx. Notice that the 
coefficient a > above is the standard deviation of each random variable z{q), and that 
u > represents the shape parameter of the probability law (u = 2 for the Gaussian, u = 1 
for the Laplace pdf). 

It should be stressed at this point that this model does not take into account the 
possible decomposition of z into a signal plus a noise, since we model directly the wavelet 
coefficients of z. It is however suitable for the main example we have in mind, namely a 
situation where the family {z{q)] q < N} is sparse. Indeed, when n < 2 in expression (jl]), 
the distribution of the z{q)^s becomes heavy tailed, which means that one expects a few 
large coefficients and many small ones. This is the situation we are mostly interested 
in, but our analysis below is valid for any coefficient n > once our basic model is 
assumed to be realistic. It should also be stressed that in the end, our algorithm is also of 
thresholding type, as may be seen from equation Q. This means in particular that it is 
certainly suitable to retrieve signal from a noisy input, on the same basis as the original 
thresholding algorithm. 

With these preliminary considerations in mind, here are the two main results which 
will be presented in this paper: 

(1) We have seen that the sequence of thresholds {T^; k > 0} involved in the peeling 
algorithm converges almost surely. However, it is easily checked that it can converge 
either to a strictly positive quantity Tj, either to 0. This latter limit is not suitable for 
our purposes, since it means that no noise will be extracted from our signal. One of the 
main questions raised by the peeling algorithm is thus to find an appropriate constant F 
in ([2]) such that (i) The algorithm yields a convergence to a non trivial threshold Tj > 0. 
(a) F is small enough, so that a sufficient part of the original signal is retrieved. 

The previous attempts in this direction were simply (see [9]) to take F = 3a with 
experimental arguments; after the analysis performed in [15], this quantity was reduced 
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to F 



Fm, a quantity which is defined by 



F 



3r(i/u) 




(5) 



m 



U 



However, the latter bound has been obtained thanks to some rough estimates, and we 
have thus decided here to go one step further into this direction. Indeed, our first task will 
be to determine precisely, and on a mathematical ground, a constant Fc = Fc{u, a) such 
that: if F > Fc, the algorithm yields a convergence, with high probability, to a strictly 
positive constant Tf = Tf{uj), whose fiuctuations around a typical non-random value x* 
will be determined. In particular, we will see that our constant Fc is always lower than 
Fm- Whenever F < Fc, we also show that Tk converges to with high probability (see 
Proposition 13. 5p . 

(2) In the regime F > Fc, we determine that the optimal number of steps for the peeling 
algorithm is of order log(A^), where we recall that is the total number of wavelet 
coefficients involved in the analysis. After this optimal number of steps. Theorem 13.31 
quantifies also sharply the oscillations of Tf with respect to its theoretical value rrif. 

It is important to show that our theoretical results can really be applied to real data. 
We have thus decided first to compare the performances of our algorithm with other 
wavelet denoising procedures, on some classical benchmark signals proposed in [6]. It will 
be seen that our algorithm performs well with respect to other methods, independently 
of the value of the shape parameter in (jl]) and of the form of the benchmark signal. 
Interestingly enough, this assertion is true even if Hypothesis 11.11 is not always satisfied 
by the benchmark signals under consideration. 

A second step in our practical part of the study is the following: since the peeling 
algorithm has been introduced first in a medical context, we give an illustration of its 
performances on EGG type signals. More specifically, we shall consider a simulated EGG 
signal, and observe the denoising effect of our algorithm on a perturbed version of those 
electrocardiograms. It will be observed again that the algorithm under analysis is a good 
compromise between denoising and preservation of the original signal. 

Let us mention some open problems that have been left for a subsequent publication: 
first, let us recall that the so-called block thresholding has improved the behavior of 
the original thresholding algorithm in a certain number of situations (see e.g. [2] for a 
nice overview). It would be interesting to analyze the effect of this procedure in our 
peeling context. In relation to this problem, one should also care about some reasonable 
dependence structure among wavelet coefficients, beyond the independent case treated in 
this article. Finally, we have assumed in this paper that the parameters of the distribution 
Pa,u were known, which is typically not true in real world applications. One should thus 
be able to quantify the effect of parameter estimation on the whole denoising process. 

Here is how our article is structured: we show how to compute optimal constants for 
the peeling algorithm at Section [2l Then the probabilistic analysis of the algorithm is 
leaded at Section [31 Finally, some numerical experiment on simulated and pseudo-real 
data are performed at Section 4. 
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2. Critical constants for the peeling algorithm 

This section is devoted to the computation of an optimal constant F in equation ([2]), 
ensuring a convergence of the threshold Tk to a non trivial Tj, and still allowing to retrieve 
a maximal amount of approximate signal from our noisy input z. 

Let us start this procedure by changing slightly the setting of the peeling algorithm. 
Indeed, it will be essential for our convergence theorems at Section [31 to be able to express 
the fixed point algorithm in terms of empirical processes. To this purpose, we resort to a 
simple change of variables by setting: 

Uu=Tl and Y{q) = z{qf. 

Note that Uq = +oo. It is then readily checked that the fixed point algorithm of Section [T] 
is equivalent to the following: 

(1) Uk+i = gN^Uk), where g^ is of the form: 

9Nix) = ^ X! Y{q) l{Y{q)<x}- 

q<N 

For a suitable constant F, this defines a converging decreasing sequence (Uk), such 
that limfc„,oo Uk = Uf. 

(2) Stop the loop when Uk+i = Uf, and then set 

£(g) = ^(g)l{|.(,)l> /Z^l- 



The fixed point Uf = Uf{uj) is then solution of the equation gN{x) = x. We wish to find 
the critical (minimal) F which ensures Uf to be strictly positive. 

A preliminary step towards this aim is to consider a natural deterministic problem 
related to the equation gN^x) = x. Indeed, for a fixed value of x G M+, the law of large 
numbers asserts that the random variable gN{x) converges almost surely to the quantity 



X 



ga,u{x) = FM wp„^u{w) dw, 







where is the common density of the random variables Y{q) = z{qY, given explicitly 
under Hypothesis 1 1 . 1 1 by 

Pa,u{,w) = ^ — l{«,>o} = "-^^ ' 1{^>0}- (6) 



w ^Jw 

This result is only a simple convergence result, and not an almost sure uniform convergence 
of giq towards g„^u- However, as will be shown in the Section [31 the fixed point f// is close 
in some sense to a fixed point of g„^u- Therefore, our study of the fixed points oi g^ can 
be reduced to the study of g^^u- Our aim is now to give some sharp conditions on the 
coefficient F ensuring that the equation ga,u{x) = x has at least one solution x > 0. 

According to ([6|) we obtain, for x > 0: 



Jo 
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1 /2 

with P = ^ ( r(i/!!) ) ^ ~ 2r{i/u) ■ Furthermore, a simple change of variables argu- 

ment yields: 

gaA^) = FV2rinc((/5v^)", 3/m) (7) 

where Ti^c{x,a) = e'^f^'^dt is the incomplete Gamma function. 

Observe that the trivial change of variables w = a^y in the integral above yields the 
expression: 

ga,u{^) = O^V,n(a;/cr^) (8) 

Hence, solving ga,u{x) = x is equivalent to solve gi,ii{v) = v, for v = xjo^. We shall 
consider our equation in this reduced form, since a has only a scale role in the fixed point 
problem of g^^^ and can be omitted in the study. In the sequel, we thus solve the problem 
in its reduced form: 5'i,„(x) = x. Furthermore, for notational sake, we simply denote g\^u 
by g- 

The resolution of the equation g[x) = x boils down to the joint study of g and of 
a function d defined by d{x) = g{x) — x. These studies are a matter of elementary 
considerations, and it is easily deduced that g has the following form, as a function from 
M+ to M+: 

(1) g IS increasing and lim2,.^oo g{x) = F"^ = g^. 

1 /2 

(2) g is convex on [0, /^'^u"^/""] and concave on [/J'^n"^/", +oo), where (3 = ^ r(^/„j j 

More precisely, it is easy to prove the existence of a critical value such that: 

(1) If F < Fc, the only fixed point of g is 0. 

(2) If F = Fc, g has exactly two fixed points (0 and x* > 

(3) If F > Fc, g has exactly three fixed points (0, li, and x*), such that < li < x* 
and < x*. 

These facts are well illustrated by Figure 1 (for cr = 1). 



super— critical : F— 1.15 Fc 
critical : F=Fc 




Figure 1: Curves corresponding to g, in the critical case {F = Fc), and in supercritical 
and subcritical cases {F = l.lSFc and F = 0.9Fc), for cr = 1 and u = 2. 



Let us turn now to the computation of the critical coefficient Fc and the critical fixed 
point x*. In fact, once the study of our function d is performed, it is also easy to show 
that F = Fc and r = x* are solutions of the system: 

g'{r) = 1, and g{r) = r. 
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u 


0.1 


0.5 


1 


2 


3 4 


Fc 


4.0215 


2.7830 


2.42537 


2.16169 


2.0472 1.98181 



Table 1: Critical constant for different shapes u. 

where we recall that the coefficient F enters into the definition of g. This system is 
equivalent, in the generalized Gaussian case, to: 




where it should be reminded that T and Fine designate respectively Gamma and incomplete 
Gamma functions. The latter system can be solved with the Mathematica software, and 
the solutions for different u are illustrated in Figure [2l Some typical values of Fc in terms 

F 

4.0 



3.5 
3.0 
2.5 
2.0 




Figure 2: The values of critical F^ and F^ for a = 1 and u G [0, 4]. 



of u are also given in Table [TJ In particular, it can be observed that Fc is smaller than 
the bound F^ proposed by [15], which has been recalled at equation ([5]). 

3. Probabilistic analysis of the algorithm 

3.1. Comparison Noisy dynamics/ Deterministic dynamics. The exact dynamics 
governing the sequence {Un, n > 0} is of the form Un+i = QNiUn)- In order to compare 
this with the deterministic dynamics, let us recast this relation into: 

Un+i = g{Un) + en,N, where en,N = gN{Un) - g{Un). 

Notice that the errors tv are far from being independent, which means that the relation 
above does not define a Markov chain. However, a fairly simple expression is available 
for Un- 

Proposition 3.1. For n > 0, set g°^ for the n*^ iteration of g. Then, forn > 0, we have: 

n—l n~p 

Un=g°''{Uo)+Rn, with Rn = J2^P,NY[9{Cp+q), 

p=0 q=2 

where the random variable Cj (j > 2) is a certain real number within the interval 
[g°^-'^^\Uo); Uj^i]. In the definition of Rn, we have also used the conventions 11^=2'^'? ~ -'- 
and Ro = 0. 
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Proof. It is easily seen inductively that Rq = 0, i?i = Eq^n and for n > 1 
Hence, by a backward induction, we obtain: 

n j—2 n—1 n—p 

Rn = ^^n-j,Ar ]^fl''(C'n-«) = ^p,N W g'{.Cp+q), 

j=l 1=0 p=0 q=2 

which ends the proof. □ 

A useful property of the errors Ep^N is that they concentrate exponentially fast (in terms 
of A^) around 0. This can be quantified in the following: 

Lemma 3.2. Assume that the wavelets coefficients are distributed according to a general- 
ized Gaussian random variable with parameter u > 0, whose density is given by Q), and 
recall that F is defined by equation Then for every < 7 < there exists a 

finite positive constant K > such that for all N > 1 and for all A G [0, 'jN'^^^], 

E Ul-P,ivl"/'l < K_ (9) 

Moreover, for all N > 1, for all p > and I > 0, 

P(kp,^|>0<^e-^'"'''^"'*. (10) 
Proof. Recall that Ep^N' is defined by: 

£p,N = 9n{Up) - g{Up) = ^ 5Z Y{j) l{Y{j)<Up} - 9{Up 

\j=i 

for a collection {Y{i); i < N} of i.i.d random variables, where Y{i) can be written as 
Y{i) = z{iY and z{i) is a generalized Gaussian random variable with parameter m > 0, 
whose density is given by (jlj). For a fixed positive x, the fluctuations gN^x) — g{x) are 
easily controlled thanks to the classical central limit theorem or large deviations principle. 
The difficulty in our case arises from the fact that Up is itself a random variable, which 
rules out the possibility of applying those classical results. However, uniform central limit 
theorems and deviation inequalities have been thoroughly studied, and our result will be 
obtained by translating our problem in terms of empirical processes like in |16] . 

In order to express Ep^N in terms of empirical processes, consider x E [0, 00] and define 

h^:R+ ^ R+ by h^{u) = F'^u l{u<x}- Next, for / : R+ ^ M, set 

1 ^ 

^Nf = J;^,Y.^f{Y{^)) - E[fiYm], 
1=1 

and with these notations in mind, notice that 

1 ^ 

i=l 

It is now easily seen that 

ep,N = N-'^'G^hup, 

and the key to our result will be to get good control on Gat/ix in terms of A^, uniformly 
in X G [0, 00]. 
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Let us consider the class of functions Q = {h^; x G [0, +00]}. According to the termi- 
nology of [16] , the uniform central limit theorems are obtained when ^ is a Donsker class 
of functions. A typical example of Donsker setting is provided by some VC classes (see 
[T6| Section 2.6.2]). The VC classes can be briefly described as sets of functions whose 
subgraphs can only shatter a finite collection of points, with a certain maximal cardinality 
M, in M^. For instance, the collections of indicators 

T = {i[o,x); X e [0, +00]} . 

is a VC class. Thanks to [HI Lemma 2.6.18], our class Q is also of VC type, since it can 
be written as 

g = j^.h = {fh-feJ^}, 

where h : M+ is defined by h{u) = hooiu) = F'^u. 

In order to state our concentration result, we still need to introduce the envelope Q of 
which is a function Q : M defined as 

Qiu) = sup{/(n); feg},ue R+. 

Note that in our particular example of application, we simply have Q = h. Let us also 
introduce the following notation: 

ATIGn; G, a, m] = E* [e^"'P/es l^^/r] , and Af[h; X,m]=E [e^l'^^^^l'"] , 

where E* is the outer expectation (defined in [15] for measurability issues), Y is the square 
of a generalized Gaussian random variable with parameter m > 0, A > and m > 0. 

Then, since ^ is a VC class with measurable envelope, ^ is a Donsker class and [TB| 
Theorem 2.14.5 p. 244] leads to: 

J\f[GN] G, A, m] < cAf[h; A, m], 

with c a finite positive constant which does not depend on A^, A and Q. Furthermore, since 
Y is the square of a generalized Gaussian random variable with parameter u, it is readily 
checked that 

Af[h; A, m] < 00 

for A small enough (namely A < {(3 / F)'^'') and m = u/2. Recalling now that Sp^N = 
N^^^'^Gj^fhup, we have obtained: 

< Af[GN; Q, A, u/2] < cM[h- 7, n/2] = A' < 00 

for A < 7 < {f3/F)^, which easily implies our claim (Q. 

Let / > 0. Then, 

P(kp,^| > /) = P (e^^"''l^--l"'' > e^'"''^"'') . 

The concentration property (ITOj) is thus an easy consequence of ([9]) Markov's inequality. 

□ 



E 
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3.2. Supercritical case: F > Fc- In this section, we assume tliat F > F^. Tlien it is 
easily deduced from the variations of d given above that gi^u = 9 has the following form, 
as a function from M_|_ to ]R_|_ (see Figure 1): 

(1) g is increasing and lim^-^oo fl'(3;) = F"^ = g^. 

(2) g has exactly two fixed points apart from 0, called ii and x*, with ii < x*. 

(3) There exists £2 G {ii,x*) such that g'ih) < 1 and g is concave on [£2, 00). 

(4) Let 5 > such that h < I2 + 5 < x*. Then, g{l2 + 5) > k + S. 

With these properties in mind, we can study the convergence of the deterministic 
sequence n > 0} defined recursively by xq = 00 and Xn+i = g{xn)- Indeed, it is easily 
checked that Xn is decreasing to x* as n — > 00. Furthermore, let M^* = sup{g'{x); x > 
X*} = g'{x*), and recall that M^* < 1. Then 

\Xn+l -X*\= Xn+l -X* = g{Xn) - g{x*) < M^. {Xn - X*) , 

which means that a geometric convergence occurs: inductively, it is readily checked that, 
for n > 1: 

\xn-x*\<M:r'{goo-x*), (11) 
where we recall that goo = linix^oo 5'(a^)- 

We are now ready to prove the convergence result for the peeling algorithm, in terms 
of a concentration result for the noisy dynamics around the deterministic one: 

Theorem 3.3. Assume F > F^ (where these quantities are defined at Section and 
that the wavelets coefficients are distributed according to a generalized Gaussian random 
variable with parameter u > 0, whose density is given by Q). Let a < 1/2, C = ,) +V 
with r] > 0. For any N G N*, let n = n{N) = [Ca In A^] + 1. Then, there exist A, 7 two 
positive finite constants such that for all N &W , and any F lying in an arbitrary compact 
interval [0,Fo], we have 

Remark 3.4. This theorem induces three kind of information about the convergence of 
our algorithm: (i) For a fixed number of wavelet coefficients A^, the optimal number 
of iterations n for the peeling algorithm is of order ln(7V). (ii) Once n is fixed in this 
optimal way, f/„ is close to the fixed point x* of g, the magnitude of |f/„ — | being of order 
j\r--(i/2-e) fQj^ g^j^y 5 > 0. (Hi) The deviations of Un from x* are controlled exponentially 
in probability. 

Proof of Theorem \3.3[ Observe first that, owing to Proposition 13.11 and inequality fllip . 
we have 

\Un -x*\ = (f/o) - X* - < M:r\goo - X*) + |K| , 
for any n > 1. Let then 6 > and let us fix n > 1 such that 

M^r'igoo - X*) < (12) 
i.e. n > 1 + ln{6/(2goo — 2x*))/ ln(Ma;*). Then it is readily checked that: 

(13) 
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and we will now bound the probability in the right hand side of this inequality. To this 
purpose, let us introduce a little more notation: for n, /c > 1, let flk be the set defined by 

nk = {Loe n; inf {j > / Uj <i2 + S} = k}, 
and set also Qn = Ufc=i ^fc- Then we can decompose ( IT3|) into: 

F{\Un-x*\>5)<¥{n^)+¥\nin{ \r^\ > ^ H . (i4) 



We will now treat these two terms separately: 

Step 1: Upper bound for P(r2„). Let us fix A; > 1 and first study P {^k)- To this purpose, 
observe first that 

c{Uk<l2 + S< Uk-i} . 

Recall that £2 + 6 satisfies g{i2 + S) > £2 + Hence, since Uk = QNiUk-i) and invoking 
the fact that g is an increasing function, the following relation holds true on 0,^. 

QNiUk^i) <l2 + 5 and g{l2 + 5) < giUk^i). 

We have thus proved that 

^k C {gNiUk^i) - giUk^i) <l2 + 5- gih + 6)} , 

where I2 + S — g{l2 + 5) = —L < 0. Since gNiUk~i) — giUk^i) = £k-i,N by definition, we 
end up with: 

P(fife) <P(|£fc_l,jv| > L). 

A direct application of Lemma [3.21 yields now the existence of 7, G (0, 00) such that 
for all A; > 1 and all > 1 

F{Qk) < i^'e-^^"''^"'*. 

Hence 

n 

< < J^ne-^^"''^"''. (15) 

k=l 

Step 2: Upper hound for P(f2^ fl {|-R„| > |}). We have constructed the set (in so that, 
for all 2 < < n + 1, the random variables Ck introduced at Proposition 13.11 satisfy 
< ^' (Cfc) < p < 1 on f]^. Thus 

' n—l ? \ / n—l 





where we have set 



z/ = ^ ^, and M„ ? = — ^ 

so that {up] 0<p<n — l}isa probability measure on {0, . . . , — 1}. 

We introduce now a convex non- decreasing function a„ which only depends on the shape 
parameter u, and which behaves like exp(x"/^) at infinity. Specifically, if m > 2, we simply 
define a„ on IR+ by 

aJx) = e 



12 



C. LACAUX, A. MULLER, R. RANTA, AND S. TINDEL 



When n < 2, setting s„ = (2/ti — 1)^^", then x i— > exp(x"''^) is concave on [0, s^] and 
convex on [su,+cx3). Then, we modify a httle the definition of in order to obtain a 
convex function: we set 



au{x) =e l[,„,oo) + e " l[o,.„) 

where s„ = (2/n — 1)^''". 

Since a„ is a non- decreasing function, for all A > 0, relation (fT6i) implies that: 



[17) 



< 



p=0 

E 



n-l 



p=0 



where we have invoked Markov's inequality for the second step. Hence, applying Jensen's 
inequality, for all A > 0, we obtain: 



n-l 



p ( n;; n <j > ^ [ ) < —j^ — Y Yl "p'^ (M^pM)) ■ 



Furthermore, owing to the definition ffTTj) of Qu, 

E(a.(A|e,,^|))<E(e^"^^l^--l"^^)+e2/'^-^ 
for all p > 0, all > 1 and all A > 0. 



Then, applying Lemma [3. 2 [ we have: 



K + e2/"-i 



F\nin'>\RJ>-)' < 



for any A < 'j'^/'^N^/'^. Since M^^ > (1 — p)S/2 and since a„ is a non-decreasing function, 
by choosing A = ^^/"A^^/^, we obtain: 



FlQinl \RJ > - > ] < 



with 71 = (1 - p)72/"/2 > and i^i = + e^/^-i. 



Choose now 5 = A^~", with a < 1/2. Observe that for large enough, 'ji5N^^'^ > Su 
and thus (^ji5N^^'^^ = e"''^^ ^^^^^ a)"/2^ Hence, there exists a finite positive constant K' 
such that for all > 1 and p > 0, 



¥{n:^n{\Rn\> 



2m 



< K'e 



■/ _yJ^(l/2-a)u/2 



[18) 



with 7 = '-fi^'^. 
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Step 3: Conclusion. Putting together (fT3i) . (fT^ . (fT5|) and (fTSl) . choosing 5 = N ^ with 
a < 1/2, we end up with: 

for any n such that n > 1 — a ln(A^/ {2goo — 2x*)) / ln(Mj,*). Choose now n = [Ca In A^] + 1. 
If the following condition holds true: 

lim {n + a \n{N/{2g^ - 2x*)) / ln(M^*)) = +oo 
i.e. if C > — l/ln(M^*), then for Nq large enough, 

n = [CalniV] + 1 > 1 - a ln(A^/(2^oo - 2x*))/ ln(M^*)- 

We thus choose C = -1/ln {Mx*) + 'r] with 77 > 0. Hence, for N > Nq and n = [CalniV] + 
1, we have: 

P {\Un -x*\> N'") < ni^e-^^"''^"'* + i^'e"^ 

Therefore, since (1/2 — a)u/2 < u/4 we have proved that there exists a positive finite 
constant A such that for all G N*, 

which is the desired result. □ 

3.3. Subcritical case: F < Fc. We show in this section that the choice of the constant 
Fc for the peeling algorithm is optimal in the following sense: if one chooses a parameter 
F < Fc, then the threshold sequence converges to with high probability. Specifically, 
we get the following result: 

Proposition 3.5. Consider F < F^. and assume that our signal z satisfies Hypothesis \l.li 
Let N eW, a < 1/2, and n > Qn, where 

Q, = max^l + ; ij . 

Then, there exist A, 7 two positive finite constants (independent of N and n) such that 

P {Un > iV"") < Ae-^^'^'""'"''. (19) 

Proof. In the subcritical case, the following property holds true for the function g = gi^u 
defined by ([8]): there exists a constant n G (0, 1) such that, for all x > 0, < g{x) < kx. 
We thus have the following relation for the noisy dynamics of Un'- 

Iterating this inequality, we have: 

Un < ^""-'Ui + ^'~^£n-,,N- (20) 

i=i 

According to the fact that Ui = g^o + £o,Ni we end up with: 

n 

Un < ^""-'goo + Yl ^'~'^n-j,N, (21) 

a relation which is valid for any n > 1. 
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Consider now a < 1/2 and assume that n > Q^, which ensures ^Qoo < N "/2. 
Then invoking (12T1) . we have 



We are thus back to the setting of the proof of Theorem 13.3^ Step 2. Along the same hues 
as in this proof (changing just the name of the constants there), the reader can now easily 
check inequality (|T9|) . 



Remark 3.6. We have chosen here to investigate the case of a probability P(t/n > A^~") 
and of a logarithmic number of iterations n, in order to be coherent with Theorem 13.31 
However, in the simpler subcritical setting, one could have considered a number of itera- 
tions of order A^, opening the door to a possible almost sure convergence of t/„ to 0. We 
have not entered into those details for sake of conciseness. In the same spirit, we have not 
tried to solve the (much harder) problem of the behavior of our algorithm in the critical 
case F = Fc. 



The previous sections aimed at giving an optimal criterion of convergence for the peel- 
ing algorithm, in terms of the constant Fc, and under the assumption of a signal whose 
wavelets coefficients are distributed according to a generalized Gaussian random distri- 
bution. We now wish to test the algorithm we have produced in terms of denoising 
performances, on an empirical basis. 

To this purpose, we shall compare various peeling algorithms (detailed at Section 14.11 
below) and two traditional wavelet denoising procedures, namely Universal and SURE 
shrinkage (see [6]). The comparison will be held in two types of situations: first we 
consider the benchmark simulated signals proposed in the classical reference [6]. Then we 
move to a medical oriented application, by observing the denoising effect of our algorithms 
on ECG type signals. In both situations, we shall see that peeling algorithms enable a 
good balance between smoothing and preserving the original shape of the noisy signal. 

4.1. Thresholds. Theorem 13.31 and Remark 13.41 induce us to implement the three follow- 
ing procedures: 

(1) The first one exploits only implicitly the peeling approach and can be reduced to a 
hard (or soft) thresholding in ([3]), where Tf is obtained in the three following ways: recall 
that, according to the value of the shape parameter u, we have computed a critical value 
Fc above which the peeling algorithm converges to a non trivial limit (see e.g. Table 1) 
with high probability. We thus consider two supercritical cases, namely Fq^ = 1.05-Fc and 
= 1.15-Fc. In these two cases, we compute r* = (x*)^/^, where x* is the fixed point of 
the function g„^u defined by ([7]), as analyzed at Section [21 We call respectively Tc^os and 
Tc,i5 these two values, which serve as a threshold in ([3]). A third value of the threshold 
is also considered by taking F = Fm in (j7]), where Fm is defined by ([5]), and computing 
then the corresponding threshold Tc,„. This allows a comparison with the older reference 
[T5] . Let us stress the fact that for this first approach, no iterations are performed. 




□ 



4. Denoising algorithms implementation 
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0.5 1 1.5 2 2.5 3 3.5 4 
Shape u 

Figure 3: Final thresholds for the 7 peeling algorithms for u = [0.1 ... 4]. For comparison, 
universal threshold T„ = 4.29 for = 10000. 



(2) The second procedure computes the final thresholds using a fixed number of iterations 
in the peeling algorithm. According to one of the conclusions in Theorem 13.31 we take this 
number of iterations equal to log A^. As in the first approach, 3 thresholds were obtained, 
for F = 1.05-Fc, F = l.lbF^ and F = F^. Theoretically, this implementation yields some 
thresholds Tc,05; ^c,i5 and Tcm which should be close to their respective exact counterparts 
Tcfi5, ^c,i5 and Tcm (within the conditions stated by Theorem 13.31) . 

(3) The third implementation is the one proposed in [13] (fixed point descent with a 
sufficient convergence condition F = Fm)- The resulting threshold will be noted as T^. 

The relations between the 7 thresholds mentioned above are represented at Figure [3] 
for different shape parameters u. The lines represent the theoretical values Tc^os, Tc^ib and 
Tcm, while the shaded zones represent a superposition of the estimated Tc,055 ^c,i5 and 
Tc,m obtained over 100 simulations (generalized gaussian vectors of A^ = 10000 points, 
zero mean and unitary standard deviation). The averaged values of these estimations are 
very close to the theoretical values, which confirms that peeling algorithms implemented 
with logA^ iterations converge to some thresholds very close to the theoretical values 
(see Theorem 13 . 31 again) . Moreover, the fixed point implementation taken from [15] gives 
almost the same final threshold as Tcm (the respective curves and shaded zone are merely 
superposed) and therefore is not figured here. 

4.2. Denoising: simulated signals. To assess the denoising performances of the peel- 
ing algorithm, we used the 4 classical benchmarks proposed in [6j, namely Blocks, Bumps, 
HeaviSme, Doppler (figure S]), with 4 lengths (A^ = [2048,4096,8192,16384]). The 
signals were normalized to have unitary power. Twenty types of zero-mean random 
noise n were generated according to generalized gaussian with shape parameters Un = 
[0.2, 0.4, 0.6 . . . 3.8, 4]. The noise was then scaled to obtain signal to noise ratios SNR = 
[1,2,3], i.e., [ 0, 3, 4.8] decibels. Furthermore, the wavelet decomposition of the signal 
has been performed based on the sym 8 wavelet, and the noise was added to the wavelet 
coefficients of the noise-free signals to obtain the "measured signal" wavelet coefficients 
z. Each of these noisy signals were simulated 500 times to obtain averaged results. A 
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(a) 



(b) 





Figure 4: Benchmark signals: (a) Blocks, (b) Bumps, (c) HeaviSine,, (d) Doppler 

statistical hypothesis testing showed that the wavelet coefficient of the signals under con- 
sideration could be assimilated to generalized Gaussian random variables, with the notable 
exception of the Bumps process. 

The shape parameter Uz of the total signal z, which determines the thresholds of the 
peeling algorithms, was estimated using the absolute empirical moments mi and m2, 
(with nir = 'E[\z\'^], see [IIIIIO]), while the mean fiz and the standard deviation were 
estimated using classical empirical estimators. 

Denoising was performed by soft thresholding (instead of the hard one described by 
equation ([3])) using the 7 algorithms described at Section Wl] as well as the classical Uni- 
versal and SURE thresholding [6], for comparison. More elaborated wavelet denoising 
methods (either based on redundant wavelet transforms or on block approaches [3| fT7lfT8] ) 
were not considered for the comparison, since their nature is different: all the algorithms 
tested in this paper are term-by-term approaches for orthogonal wavelet transform thresh- 
olding. 

The denoised estimate x of the original signal was reconstructed by inverse wavelet 
transform. We recall here that Universal thresholding aims to completely eliminate 
Gaussian noise (and therefore it risks to distort the signal), while SURE thresholding 
estimates the original signal my minimizing the Stein Unbiased Risk Estimator of the 
mean squared error between x and x, assuming also a Gaussian noise (thus basically 
aiming a minimum distortion of the signal, as the peeling algorithms). The denoising 
performance was evaluated using the signal to noise ratio after denoising: 



e: 



N 



[Xit 



x{i)y 



As expected, the results obtained for Tc^^, Tc^i5, fcm and Tm are very similar to those 
obtained by T^fi^, Tc^is, Tcm, so only results of the three latter are detailed hereQ. Synthetic 



These three algorithms are of course much faster than their iterative versions. 
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Figure 5: Signal to noise ratios after denoising as a function of the noise distribution. The 
presented graphs are obtained for noisy signals with S'A^-R=3dB. 



comparisons are presented at Figure [5] for different shape parameters m„ of the noise 
distribution, for all the 4 benchmark signals and for = 4096 (to ease the presentation, 
detailed tables of results are omitted, since the values can be read with enough precision 
on the graphs). An illustrative example on the Block signal is also provided at Figured 

Several interesting observations can be made. Obviously, the noise type (shape pa- 
rameter Un) greatly influences the performances of all algorithms: they are lower for 
heavy-tailed noise distributions, which indicates that this type of noise is more difficultly 
eliminated from measured signals. As one could expect from its development, SURE 
thresholding is the best choice for Gaussian noise, for which it also attains its best per- 
formance (this algorithm continue to have a very good performance for higher Un). The 
Universal thresholding attains its best performance for Laplacian noise u = 1 and it has 
very good results for super-Gaussian noises («„ < 2). On the contrary, its performances 
are the worst for high values of the shape parameter of the noise (except for the very low 
frequency signal HeaviSine, for which all algorithms are similar for sub-Gaussian noises 

Un > 2). 

The peeling algorithms need a more detailed analysis: they are better than SURE for 
super-Gaussian noises, with the theoretical Tc^i5 (or, equivalently, theoretical and 
iterative Tcm, T^c.is and Tm) being slightly better than Tc^o.s- The order between the two 
peeling algorithms tend to change for sub-Gaussian noise (m„ > 2), especially for impulsive 
Blocks and Bumps. To conclude, it seems that for super-Gaussian noises. Universal 
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Figure 6: Denoising example for Blocks: (a) original signal, (b) noisy signal 
(Laplacian noise n„=l, SNR=10dB), (c) Universal (5Ari?rfe„=17.7dB), (d) SURE 

(5iVi?rfen = 16.9dB), (e) T,,05 (5iVi?den = 18.6dB), (f) Te„ (5iVi?rfen=18.5dB) 



thresholding and peeling algorithms are the best choice, while for sub-Gaussian noises the 
results are almost similar, with SURE thresholding having the best performances when 
the noise is almost Gaussian. In all, the peeling algorithm with T^^is works in a satisfying 
way, independently of the shape parameter u. 

4.3. Denoising: real and pseudo-real signals. A last point should be reminded: peel- 
ing algorithms were mainly developed for biomedical applications [U Ej. Therefore, we 
have chosen to evaluate the performance of the newly developed versions on biological 
signals (normal electrocardiogram - EGG and normal background electroencephalogram 
EEG). However, when dealing with real signals for denoising, one is faced with the fol- 
lowing problem: it is impossible to assert that the denoising is accurate when the original 
signal is unknown. It is also very hard to be provided with a non noisy signal which can 
be perturbed artificially. 

In order to cope with this situation, we have chosen to work with a commonly used 
EGG simulator, implemented in Matlab. In this way, one can produce a clean EGG type 
signal, spoil it with an artificial noise, and then try to recover the original signal by some 
denoising procedures. The simulated EEG was generated according to the procedure 
described in [19] (see the url for the Matlab code). 

This is the protocol we have followed for our experiment. Numerical results globally 
confirm those obtained on the benchmark signals, both for the simulated EGG and EEG. 
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Figure 7: Denoising example for a simulated ECG: x original signal, z noisy signal 
(Laplacian noise Un=l, SNR=OdB), xuni Universal thresholding (S'A^-Rde„=4.8dB), Xsure 
Sure thresholding (5'7Vi?den=3.7dB), Xc,o5 T^^^ thresholding (S'A^-Rden=5.9dB) and Xc,i5 
Tc^is thresholding (5'iVi?den=5.3dB). 



They are not reproduced here for sake of conciseness, but an illustrative example is given 
at Figure [71 

Two-dimensional versions of the tested algorithms were applied on real benchmark im- 
ages also (Lena, House, Barbara, Peppers), with similar performances to those obtained 
for the 1-D signals. Therefore, the detailed results are not presented here. 
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